Penalized regression combining the L1 norm and a correlation based penalty

Authors

  • Mohammed El Anbari
  • Abdallah Mkhadri
Abstract

Variable selection in linear regression can be challenging, particularly when a large number of predictors is available and some of them may be highly correlated, as in gene expression data. In this paper we propose a new method, the elastic corr-net, which simultaneously selects variables and encourages a grouping effect in which strongly correlated predictors tend to enter or leave the model together. The method is based on penalized least squares with a penalty function that, like the Lasso penalty, shrinks some coefficients to exactly zero. In addition, the penalty contains a term that explicitly links the strength of penalization to the correlation between predictors. A detailed simulation study in small- and high-dimensional settings illustrates the advantages of our approach relative to several competing methods. Finally, we apply the methodology to three real data sets. The key contribution of the elastic corr-net is the identification of settings where the elastic net fails to produce good results: in terms of prediction accuracy and estimation error, our empirical study suggests that the elastic corr-net is better suited than the elastic net to situations where p ≤ n (the number of variables is less than or equal to the sample size). If p > n, our method remains competitive and also allows the selection of more than n variables in a new way.

French title (translated): Penalized regression combining the L1 norm and a penalty that accounts for the correlations between variables

Résumé (translated from French): Variable selection can be difficult, particularly when a large number of explanatory variables is available and high correlations may be present, as in the case of gene expression data. In this article we propose a new penalized linear regression method, called the elastic corr-net, to simultaneously estimate the unknown parameters and select the important variables. In addition, it encourages a grouping effect: strongly correlated variables tend to be either all included in or all excluded from the model. The method is based on penalized least squares with a penalty that, like the L1 penalty, shrinks some coefficients to exactly zero. Furthermore, this penalty contains a term that explicitly links the strength of penalization to the correlation between the explanatory variables. To show the advantages of our approach over the main competing methods, a detailed simulation study is carried out in moderate and high dimensions. Finally, we …
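
The abstract describes the estimator only in words: penalized least squares with an L1 term plus a term whose strength depends on the correlation between predictors. The sketch below shows one way such an estimator could be set up. It rests on assumptions that are not stated in the abstract: the correlation term is taken to be the pairwise quadratic form sum over i<j of (b_i - b_j)^2 / (1 - r_ij) + (b_i + b_j)^2 / (1 + r_ij), where r_ij is the sample correlation, and the problem is solved by plain proximal gradient descent (ISTA). The function names and the tuning parameters lam1 and lam2 are hypothetical, and the authors' actual penalty and algorithm may differ.

```python
import numpy as np


def correlation_penalty_matrix(X, eps=1e-8):
    """Matrix W with b' W b equal to the ASSUMED pairwise correlation penalty."""
    R = np.corrcoef(X, rowvar=False)
    R = np.clip(R, -1.0 + eps, 1.0 - eps)  # guard against division by zero
    inv = 2.0 / (1.0 - R ** 2)
    np.fill_diagonal(inv, 0.0)             # only pairs i != j contribute
    W = -R * inv                           # off-diagonal: -2 r / (1 - r^2)
    np.fill_diagonal(W, inv.sum(axis=1))   # diagonal: sum_j 2 / (1 - r_ij^2)
    return W


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fit_l1_plus_correlation(X, y, lam1, lam2, n_iter=2000):
    """Minimize 0.5*||y - Xb||^2 + lam1*||b||_1 + lam2 * b' W b by ISTA."""
    W = correlation_penalty_matrix(X)
    H = X.T @ X + 2.0 * lam2 * W           # Hessian of the smooth part
    step = 1.0 / np.linalg.eigvalsh(H)[-1] # 1 / largest eigenvalue
    Xty = X.T @ y
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = H @ beta - Xty              # gradient of the smooth part
        beta = soft_threshold(beta - step * grad, step * lam1)
    return beta
```

In this sketch the L1 term is what sets coefficients exactly to zero, while the quadratic term penalizes differences between coefficients of positively correlated predictors more heavily as their correlation grows, which is one way to obtain the grouping effect the abstract mentions. In practice X would typically be standardized and lam1, lam2 chosen by cross-validation.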

Related resources

Penalized Estimators in Cox Regression Model

The proportional hazard Cox regression models play a key role in analyzing censored survival data. We use penalized methods in high dimensional scenarios to achieve more efficient models. This article reviews the penalized Cox regression for some frequently used penalty functions. Analysis of medical data namely ”mgus2” confirms the penalized Cox regression performs better than the cox regressi...

Increasing Feature Selection Accuracy for L1 Regularized Linear Models

L1 (also referred to as the 1-norm or Lasso) penalty based formulations have been shown to be effective in problem domains when noisy features are present. However, the L1 penalty does not give favorable asymptotic properties with respect to feature selection, and has been shown to be inconsistent as a feature selection estimator; e.g. when noisy features are correlated with the relevant featur...

Comparison of Ordinal Response Modeling Methods like Decision Trees, Ordinal Forest and L1 Penalized Continuation Ratio Regression in High Dimensional Data

Background: Response variables in most medical and health-related research have an ordinal nature. Conventional modeling methods assume predictor variables to be independent, and consider a large number of samples (n) compared to the number of covariates (p). Therefore, it is not possible to use conventional models for high dimensional genetic data in which p > n. The present study compared th...

l0 Sparse Inverse Covariance Estimation

Recently, there has been focus on penalized loglikelihood covariance estimation for sparse inverse covariance (precision) matrices. The penalty is responsible for inducing sparsity, and a very common choice is the convex l1 norm. However, the best estimator performance is not always achieved with this penalty. The most natural sparsity promoting “norm” is the non-convex l0 penalty but its lack ...

Reweighted l1-norm Penalized LMS for Sparse Channel Estimation and Its Analysis

A new reweighted l1-norm penalized least mean square (LMS) algorithm for sparse channel estimation is proposed and studied in this paper. Since standard LMS algorithm does not take into account the sparsity information about the channel impulse response (CIR), sparsity-aware modifications of the LMS algorithm aim at outperforming the standard LMS by introducing a penalty term to the standard LM...

Use of Two Smoothing Parameters in Penalized Spline Estimator for Bi-variate Predictor Non-parametric Regression Model

Penalized spline criteria involve the function of goodness of fit and penalty, which in the penalty function contains smoothing parameters. It serves to control the smoothness of the curve that works simultaneously with point knots and spline degree. The regression function with two predictors in the non-parametric model will have two different non-parametric regression functions. Therefore, we...

Publication date: 2011